Abstract

We study the problem of parameter estimation for time-series possessing two,
widely separated, characteristic time scales. The aim is to understand situations
where it is desirable to fit a homogenized single-scale model to such multiscale
data. We demonstrate, numerically and analytically, that if the data is sampled too
finely then the parameter fit will fail, in that the correct parameters in the
homogenized model are not identified. We also show, numerically and analytically, that
if the data is subsampled at an appropriate rate then it is possible to estimate the
coefficients of the homogenized model correctly.

Keywords: Parameter estimation, multiscale diffusions, stochastic differential equa- 
tions, homogenization, maximum likelihood, subsampling. 



1 Introduction 

Parameter estimation for continuous time stochastic models is an increasingly impor- 
tant part of the overall modelling strategy in a wide variety of applications. It is quite 
often the case that the data to be fitted to a diffusion process has a multiscale char- 
acter. One example is the field of molecular dynamics, where it is desirable to find 
effective models for low dimensional phenomena (such as conformational dynamics, 
vacancy diffusion and so forth) which are embedded within higher dimensional time- 
series. Another example is the ocean-atmosphere sciences where it is desirable to 
find effective models for large-scale structures, whilst representing the small-scales
stochastically. The multiscale structure of the data in these problems renders the prob- 
lem of parameter estimation very subtle, and great care has to be taken in order to 
estimate the coefficients correctly. The aim of the paper is to shed light on this estima- 
tion problem through the study of a simple class of model problems, typical of those 
arising in molecular dynamics. 

In econometrics and finance, the problem of estimating parameters for continuous
time diffusion processes in the presence of small scale fluctuations (market
microstructure noise) has been considered by Ait-Sahalia and collaborators [1, 2] and
more recently in [3]. In that work the microscale is input as an independent white
observational noise that is superimposed on top of a single-scale diffusion process.
We have a somewhat different framework: we work in the context of coupled systems of
diffusions exhibiting multiple scales. Our aim is to fit a single-scale homogenized
diffusion to data. Models similar to the ones considered in this paper have been
studied extensively in finance, see [12] and the references therein. In that book
there is discussion of parameter estimation for multiscale diffusions, with emphasis
on the estimation of the rate of mean reversion of volatility from historical asset
price data; see [12, Ch. 4].

Various numerical algorithms for diffusions with multiple scales have been
developed [24] and analyzed [10]. Those papers are finely honed to optimize the fitting
of the homogenized diffusion in situations where the multiscale model is known
explicitly. In contrast, in this paper we introduce multiscale diffusions primarily as
a device to generate multiscale data; we do not assume that the multiscale model is
available to us when doing parameter estimation. This enables us to gain understanding
of parameter estimation in situations where the multiscale data is given to us from
experiments, or comes from a model where the scale-separation is not explicit. Two
recent papers contain numerical experiments relating to the extraction of averaged or
homogenized diffusions from data generated by a multiscale diffusion; see [6, 9].

Despite differences from the framework used in [1, 2, 3] to study problems arising
in econometrics and finance, similarities with our work remain: trying to fit the models
on the basis of data sampled at too high a frequency leads to incorrect parameter
inference; furthermore, there is an optimal subsampling rate for the data to obtain
correct inference.

There are two forms of multiscale diffusions which are of particular interest in the
context of parameter estimation. The first gives rise to averaging for SDEs, and the
second to homogenization for SDEs. For averaging one has, for $\epsilon \ll 1$,

$$dx^\epsilon(t) = f(x^\epsilon(t), y^\epsilon(t))\,dt + \alpha(x^\epsilon(t), y^\epsilon(t))\,dU(t), \tag{1.1a}$$

$$dy^\epsilon(t) = \frac{1}{\epsilon}\, g(x^\epsilon(t), y^\epsilon(t))\,dt + \frac{1}{\sqrt{\epsilon}}\,\beta(x^\epsilon(t), y^\epsilon(t))\,dV(t), \tag{1.1b}$$

with $U$, $V$ standard Brownian motions. Averaging $f$ and $\alpha\alpha^T$ over the
invariant measure of the $y^\epsilon$ equation, with $x^\epsilon$ viewed as fixed, gives
an averaged SDE for $x$. The fast process $y$, with time-scale $\epsilon$, is
eliminated. For homogenization one has

$$dx^\epsilon(t) = \left(\frac{1}{\epsilon}\, f_0(x^\epsilon(t), y^\epsilon(t)) + f_1(x^\epsilon(t), y^\epsilon(t))\right)dt + \alpha(x^\epsilon(t), y^\epsilon(t))\,dU(t), \tag{1.2a}$$

$$dy^\epsilon(t) = \frac{1}{\epsilon^2}\, g(x^\epsilon(t), y^\epsilon(t))\,dt + \frac{1}{\epsilon}\,\beta(x^\epsilon(t), y^\epsilon(t))\,dV(t), \tag{1.2b}$$

where it is assumed that $f_0$ averages to zero against the invariant measure of the fast
process $y^\epsilon$ with $x^\epsilon$ fixed. Now $y^\epsilon$ has time-scale $\epsilon^2$
and is eliminated. The fluctuations in $f_0$, suitably amplified by $\epsilon^{-1}$,
induce $O(1)$ effects in the homogenized equation for $x^\epsilon$. In both cases (1.1)
and (1.2) it is possible to show [5] that the process $x^\epsilon(t)$ converges in law,
as $\epsilon \to 0$, to the solution of an effective SDE of the form

$$dx(t) = F(x(t))\,dt + A(x(t))\,dU(t). \tag{1.3}$$

Explicit formulae can be derived for the effective coefficients $F(x)$ and $A(x)$ in the
above equation [5, 22]. A natural question that arises then is how to fit an SDE of the
form (1.3) to data generated by a multiscale stochastic equation of the form (1.1) or
(1.2), under the assumption of scale separation, i.e. when $\epsilon \ll 1$. This paper
is a first attempt towards the study of this interesting problem, for a specific class
of SDEs of the form (1.2).

Our basic model will be the first order Langevin equation

$$dx^\epsilon(t) = -\nabla V\!\left(x^\epsilon(t), \frac{x^\epsilon(t)}{\epsilon}; \alpha\right)dt + \sqrt{2\sigma}\,d\beta(t), \tag{1.4}$$

where $\beta(t)$ denotes standard Brownian motion on $\mathbb{R}^d$ and $\sigma$ is a
positive constant. The two-scale potential $V(x, y; \alpha)$ is assumed to consist of a
large-scale and a fluctuating part

$$V(x, y; \alpha) = \alpha V(x) + p(y). \tag{1.5}$$

As we show explicitly in (5.3) this set-up puts us in the framework of homogenization
for SDEs.

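Under (1.5), the SDE (1.4) becomes

$$dx^\epsilon(t) = -\alpha\nabla V(x^\epsilon(t))\,dt - \frac{1}{\epsilon}\nabla p\!\left(\frac{x^\epsilon(t)}{\epsilon}\right)dt + \sqrt{2\sigma}\,d\beta(t). \tag{1.6}$$

If $p$ is periodic on $\mathbb{T}^d$ and sufficiently smooth, then it is well known (see
[5, 21] for example) that, as $\epsilon \to 0$, the solution $x^\epsilon(t)$ of (1.4)
converges in law to the solution of the SDE

$$dx(t) = -\alpha K\nabla V(x(t))\,dt + \sqrt{2\sigma K}\,d\beta(t), \tag{1.7}$$

with

$$K = \int_{\mathbb{T}^d} \left(I + \nabla_y\phi(y)\right)\left(I + \nabla_y\phi(y)\right)^T \mu(dy) \tag{1.8}$$

and

$$\mu(dy) = \rho(y)\,dy = \frac{1}{Z}\, e^{-p(y)/\sigma}\,dy, \qquad Z = \int_{\mathbb{T}^d} e^{-p(y)/\sigma}\,dy. \tag{1.9}$$

The field $\phi(y)$ is the solution of the Poisson equation

$$-\mathcal{L}_0\phi(y) = -\nabla_y p(y), \qquad \mathcal{L}_0 := -\nabla_y p(y)\cdot\nabla_y + \sigma\Delta_y, \tag{1.10}$$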

with periodic boundary conditions. The function $\rho(y)$ spans the null-space of
$\mathcal{L}_0^*$, the $L^2$-adjoint of $\mathcal{L}_0$. The effective diffusion tensor
is positive definite and the diffusivity is always depleted [20]. Physically this occurs
because the homogenized process must represent the cost of traversing the many small
energy barriers present in the original multiscale problem but which are not explicitly
captured in the homogenized potential. In Figure 1 we plot the potential
$V^\epsilon(x, x/\epsilon)$, as well as the average potential $V(x)$, illustrating this
phenomenon. In fact, the effective diffusivity $\Sigma = \sigma K$ decays exponentially
fast in $\sigma$ as $\sigma \to 0$. See [7] and the references therein. Thus the original
and homogenized diffusivities are exponentially different at small temperatures.
To illustrate these facts explicitly, consider the problem in one dimension, $d = 1$.
In this case the limiting equation takes the form

$$dx(t) = -A V'(x(t))\,dt + \sqrt{2\Sigma}\,d\beta(t). \tag{1.11}$$

The effective coefficients are

$$A = \frac{\alpha L^2}{Z\hat{Z}} \quad\text{and}\quad \Sigma = \frac{\sigma L^2}{Z\hat{Z}}, \tag{1.12}$$

where $L$ is the period of $p$ and

$$Z = \int_0^L e^{-p(y)/\sigma}\,dy, \qquad \hat{Z} = \int_0^L e^{p(y)/\sigma}\,dy. \tag{1.13}$$

Notice that $L^2 \le Z\hat{Z}$ by the Cauchy-Schwarz inequality. This explicitly shows that
the homogenized equation in one dimension comprises motion in the average potential
$V(x)$, at a new slower time-scale contracted by $A/\alpha$.
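As a quick numerical check of (1.12)-(1.13), the following sketch evaluates $A$ and $\Sigma$ by quadrature. It is only an illustration under stated assumptions: the cosine potential $p(y) = \cos(y)$ with period $L = 2\pi$ is the one used later in the numerical section, while the function name `effective_coefficients`, the grid size and the parameter values are hypothetical choices made here. For this particular $p$ one has $Z = \hat{Z} = 2\pi I_0(1/\sigma)$, with $I_0$ the modified Bessel function of order zero, which provides an independent check.

```python
import numpy as np

def effective_coefficients(alpha, sigma, p=np.cos, period=2.0 * np.pi, n=4096):
    """Evaluate A and Sigma from (1.12)-(1.13) for a periodic potential p."""
    # Uniform grid over one period; for smooth periodic integrands the plain
    # Riemann sum converges very quickly.
    y = np.linspace(0.0, period, n, endpoint=False)
    h = period / n
    Z = h * np.sum(np.exp(-p(y) / sigma))
    Z_hat = h * np.sum(np.exp(p(y) / sigma))
    K = period**2 / (Z * Z_hat)      # K <= 1 by the Cauchy-Schwarz inequality
    return alpha * K, sigma * K

alpha, sigma = 1.0, 0.5              # illustrative values (assumption)
A, Sigma = effective_coefficients(alpha, sigma)

# Closed form for p = cos: K = 1 / I_0(1/sigma)^2.
K_bessel = 1.0 / float(np.i0(1.0 / sigma)) ** 2
print(A, Sigma, alpha * K_bessel)
```

Both coefficients are reduced by the same factor $K$, so the ratio $A/\Sigma = \alpha/\sigma$ is unchanged by homogenization.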

The main results of the paper can be summarized as follows. Assume that we are
given a path $\{x^\epsilon(t)\}_{t\in[0,T]}$ of equation (1.6) and that we want to fit an
SDE of the form (1.11) to the given data, estimating the parameters $A$, $\Sigma$ as
$\hat{A}$, $\hat{\Sigma}$. Then the following is a loose statement of our main results;
these will be formulated precisely, and proved, below.

Theorem 1.1. If we do not subsample, then the estimators $\hat{A}$ and $\hat{\Sigma}$ are
asymptotically biased - they converge to $\alpha$, $\sigma$.

Theorem 1.2. If the sampling rate is between the two characteristic time scales of the
SDE (1.4) then the estimators $\hat{A}$ and $\hat{\Sigma}$ are asymptotically unbiased -
they converge to $A$, $\Sigma$.

The rest of the paper is organized as follows. In section 2 we present the estimators
that we will use. In section 3 we present various numerical experiments illustrating
the behaviour of these estimators. In section 4 we state the main results of this paper,
explaining the numerical experiments from the previous section. Section 5 contains
some preliminary results that will be useful in the sequel. Section 6 contains the proofs
of two central propositions concerning the behaviour of the multiscale diffusion when
observed on time-scales long compared with the fast time-scales of the process, but small
compared with the slow time-scales of the process. Section 7 is devoted to the proofs
of our theorems. Finally, section 8 is devoted to some concluding remarks.

In the sequel we use $\langle\cdot,\cdot\rangle$ to denote the standard inner-product on
$\mathbb{R}^d$ and $|\cdot|$ the induced Euclidean norm. Throughout the paper we make the
following standing assumptions on the drift vector fields:

Assumptions 1.3. The potentials $p$ and $V$ satisfy:

• $p(y) \in C^\infty_{per}(\mathbb{T}^d, \mathbb{R})$;

• $V(x) \in C^\infty(\mathbb{R}^d, \mathbb{R})$;

• $|\nabla V(x_1) - \nabla V(x_2)| \le L|x_1 - x_2| \quad \forall x_1, x_2 \in \mathbb{R}^d$;

• $\exists\, a, b > 0: \ \langle -\nabla V(x), x\rangle \le a - b|x|^2 \quad \forall x \in \mathbb{R}^d$;

• $e^{-V(x)} \in L^1(\mathbb{R}^d, \mathbb{R}^+)$.

The third assumption will be used primarily to deduce that, by choice of origin for $V$,

$$|\nabla V(x)| \le L|x|. \tag{1.14}$$

This assumption could be relaxed and replaced by a polynomial growth bound; however
this complicates the analysis without adding new insight. Similarly it is not
necessary, of course, that $V$ and $p$ are $C^\infty$. The fourth condition, however, is
essential: it drives the ergodicity of the process which we use in a fundamental way in
the analysis of the drift parameter estimators; it would not, however, be fundamental for
estimation of diffusion coefficients alone. The fourth condition implies the fifth, which
is simply the requirement that the invariant measure is indeed a probability measure; we
state the two conditions separately for clarity of exposition.



2 The Estimators 

In this section we describe various estimators for the parameters arising in equation
(1.7). We assume that we are given a path $x = \{x(t)\}_{t\in[0,T]}$, or samples from
such a path, $x = \{x_n\}_{n=0}^{N}$, where $x_n = x(n\delta)$. For simplicity we aim to
fit the equation in the form

$$dx(t) = -A\nabla V(x(t))\,dt + \sqrt{2\Sigma}\,d\beta(t), \tag{2.1}$$

where $A$ and $\Sigma$ are scalars. In one dimension this reduces to the form (1.11). Note
that in general this is only the correct form for the homogenized equation in one
dimension since, typically, the average potential has a matrix as a pre-factor, as in
(1.7). However it suffices to exemplify the main ideas in this work, and simplifies the
presentation.
The standard way to estimate the diffusion coefficient is via the quadratic variation
of the path:

$$\hat{\Sigma}_{N,\delta}(x) = \frac{1}{2dN\delta}\sum_{n=0}^{N-1} |x_{n+1} - x_n|^2. \tag{2.2}$$

A key issue in this paper is to understand how to choose $\delta$ as a function of
$\epsilon$ to ensure that data generated by (1.4) can be effectively fit to obtain the
correct homogenized diffusivity in equations such as (2.1).
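The quadratic-variation estimator (2.2) can be sketched in a few lines. This is a minimal illustration, not the code used for the experiments in this paper; the function name `diffusion_estimator` and the parameter values in the sanity check are assumptions made here. The check uses driftless single-scale data $dx = \sqrt{2\Sigma}\,d\beta$, for which the estimator should return $\Sigma$ up to sampling error.

```python
import numpy as np

def diffusion_estimator(x, delta):
    """Quadratic-variation estimator (2.2) for samples x_n = x(n * delta).

    x has shape (N + 1,) in one dimension or (N + 1, d) in d dimensions.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n_steps, d = x.shape[0] - 1, x.shape[1]
    increments = np.diff(x, axis=0)
    return np.sum(increments**2) / (2.0 * d * n_steps * delta)

# Sanity check on driftless data (illustrative values, fixed seed).
rng = np.random.default_rng(0)
Sigma_true, delta, N = 0.3, 1e-3, 200_000
steps = np.sqrt(2.0 * Sigma_true * delta) * rng.standard_normal(N)
path = np.concatenate(([0.0], np.cumsum(steps)))
Sigma_est = diffusion_estimator(path, delta)
print(Sigma_est)
```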

The standard way to estimate drift coefficients is via the path-space likelihood of
(2.1) with respect to a pure diffusion with no drift, namely (see, for example, [4, 17])

$$L(x) \propto \exp\{-I(x)/2\Sigma\}$$

where

$$I(x) = \frac{1}{2}\int_0^T \left\{ |A\nabla V(x(t))|^2\,dt + 2A\langle \nabla V(x(t)), dx(t)\rangle \right\}.$$

Maximizing the log-likelihood then gives the estimate $\hat{A}$ of $A$ given by

$$\hat{A}(x) = -\frac{\int_0^T \langle \nabla V(x(t)), dx(t)\rangle}{\int_0^T |\nabla V(x(t))|^2\,dt}. \tag{2.3}$$

If the data is given in discrete but finely spaced increments, as often happens in practice,
then this estimator can be approximated to yield

$$\hat{A}_{N,\delta}(x) = -\frac{\sum_{n=0}^{N-1} \langle \nabla V(x_n), x_{n+1} - x_n\rangle}{\sum_{n=0}^{N-1} |\nabla V(x_n)|^2\,\delta}. \tag{2.4}$$

A key issue in this paper is to understand how to choose $\delta$ as a function of
$\epsilon$ to ensure that data generated by (1.4) can be effectively fit to obtain the
correct homogenized drift coefficients in equations such as (2.1) via the estimator (2.4).
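The discretized estimator (2.4) is a ratio of two sums over the samples. The sketch below is a one-dimensional illustration under assumptions made here: the function name `drift_mle`, the quadratic test potential $V(x) = x^2/2$ and the parameter values are hypothetical. The test data is single-scale, generated by Euler-Maruyama from $dx = -A x\,dt + \sqrt{2\Sigma}\,d\beta$, so the estimator should recover $A$ up to sampling error.

```python
import numpy as np

def drift_mle(x, delta, grad_V):
    """Discretized maximum likelihood estimator (2.4), one-dimensional case."""
    x = np.asarray(x, dtype=float)
    g = grad_V(x[:-1])                        # V'(x_n), n = 0, ..., N-1
    return -np.sum(g * np.diff(x)) / (np.sum(g**2) * delta)

# Sanity check on single-scale data with V(x) = x^2 / 2 (illustrative values).
rng = np.random.default_rng(1)
A_true, Sigma_true, delta, N = 1.0, 0.5, 1e-2, 200_000
noise = np.sqrt(2.0 * Sigma_true * delta) * rng.standard_normal(N)
x = np.empty(N + 1)
x[0] = 0.0
for n in range(N):                            # Euler-Maruyama step
    x[n + 1] = x[n] - A_true * x[n] * delta + noise[n]

A_est = drift_mle(x, delta, grad_V=lambda z: z)
print(A_est)
```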

The gradient structure of the SDE (2.1) can be used to obtain a second estimator for
the drift coefficients. This second estimator, which we now derive, is of interest for two
different reasons: firstly it may be useful in practice as it may lead to smaller variance
in estimators; secondly it highlights the fact that working out how to sample the data
to obtain the correct estimation of the diffusion coefficient alone will lead to correct
estimation of the drift parameters, at least for the class of gradient-structure SDEs that
we consider in this paper. The second estimator requires the input of an estimator
$\hat{\Sigma}$ for the diffusion coefficient and is

$$\tilde{A}(x) = \hat{\Sigma}\,\frac{\int_0^T \Delta V(x(t))\,dt}{\int_0^T |\nabla V(x(t))|^2\,dt}. \tag{2.5}$$

Approximating to allow for the input of discrete-time data gives

$$\tilde{A}_{N,\delta}(x) = \hat{\Sigma}\,\frac{\sum_{n=0}^{N-1} \Delta V(x_n)\,\delta}{\sum_{n=0}^{N-1} |\nabla V(x_n)|^2\,\delta}. \tag{2.6}$$

The following result shows that $\tilde{A}(x)$ is a natural approximation to $\hat{A}(x)$.
Proposition 2.1. Let $x = \{x(t)\}_{t\in[0,T]}$ satisfy (2.1). If $\hat{\Sigma} = \Sigma$
then the estimator $\tilde{A}(x)$ is asymptotically equivalent to the maximum likelihood
estimator $\hat{A}$:

$$\lim_{T\to\infty} |\hat{A}(x) - \tilde{A}(x)| = 0 \quad \text{a.s.}$$

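Proof. We apply the Itô formula to $V(x(t))$ for $x(t)$ solving (2.1) and use formula
(2.3) to obtain

$$\begin{aligned}
\hat{A}(x) &= \frac{V(x(0)) - V(x(T)) + \Sigma\int_0^T \Delta V(x(t))\,dt}{\int_0^T |\nabla V(x(t))|^2\,dt} \\
&= \frac{V(x(0)) - V(x(T))}{\int_0^T |\nabla V(x(t))|^2\,dt} + \Sigma\,\frac{\int_0^T \Delta V(x(t))\,dt}{\int_0^T |\nabla V(x(t))|^2\,dt} \\
&= \frac{V(x(0)) - V(x(T))}{\int_0^T |\nabla V(x(t))|^2\,dt} + \tilde{A}(x).
\end{aligned}$$

Under Assumptions 1.3 it follows from [18] that

$$\lim_{T\to\infty} \frac{V(x(0)) - V(x(T))}{\int_0^T |\nabla V(x(t))|^2\,dt} = 0 \quad \text{a.s.}$$

The result follows. □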
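Proposition 2.1 can be illustrated numerically on single-scale data. The sketch below is an illustration under assumptions made here, not the authors' code: the names `drift_second` and `drift_mle`, the quadratic potential $V(x) = x^2/2$ (so that $\nabla V(x) = x$ and $\Delta V(x) = 1$) and the parameter values are all hypothetical. The true $\Sigma$ is supplied to the second estimator, as in the proposition, and the two estimates should then nearly coincide for long paths.

```python
import numpy as np

def drift_mle(x, delta, grad_V):
    """Discretized maximum likelihood estimator (2.4), one-dimensional case."""
    x = np.asarray(x, dtype=float)
    g = grad_V(x[:-1])
    return -np.sum(g * np.diff(x)) / (np.sum(g**2) * delta)

def drift_second(x, delta, grad_V, lap_V, sigma_est):
    """Second drift estimator (2.6); delta cancels but is kept to mirror the formula."""
    xs = np.asarray(x, dtype=float)[:-1]
    return sigma_est * np.sum(lap_V(xs) * delta) / np.sum(grad_V(xs) ** 2 * delta)

# Single-scale test data: dx = -A x dt + sqrt(2 Sigma) dbeta (illustrative values).
rng = np.random.default_rng(3)
A_true, Sigma_true, delta, N = 1.0, 0.5, 1e-2, 100_000
noise = np.sqrt(2.0 * Sigma_true * delta) * rng.standard_normal(N)
x = np.empty(N + 1)
x[0] = 0.0
for n in range(N):                            # Euler-Maruyama step
    x[n + 1] = x[n] - A_true * x[n] * delta + noise[n]

A_hat = drift_mle(x, delta, grad_V=lambda z: z)
A_tilde = drift_second(x, delta, grad_V=lambda z: z,
                       lap_V=lambda z: np.ones_like(z), sigma_est=Sigma_true)
print(A_hat, A_tilde)
```

Note that the second estimator never uses the increments of the path; all dependence on the sampling interval enters through the supplied diffusion estimate.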



3 Numerical Results 

In all cases we solve the multiscale SDE (1.4) using the Euler-Maruyama scheme [16]
for a single realization of the noise, with a time-step $\Delta t$ sufficiently small so
that the error due to the discretization is negligible; this requires that the time-step
is small compared with $\epsilon^2$, the fastest scale in the problem. We also employ a
sufficiently long time interval so that the invariant measure is well sampled by the
single path. Since the convergence to the invariant measure is uniform in
$\epsilon \to 0$, this is not prohibitive. We then use the data generated from the
multiscale process as input to the estimators for the homogenized diffusion (1.11). We
present numerical results for three model problems: a one dimensional monomial potential
of even degree, a one dimensional bistable potential and a two dimensional quadratic
potential. In all three cases we perturb the large-scale part of the potential $V$ by
small-scale fast oscillations, usually in the form of a cosine potential $p$.

We present two types of numerical results. Note that $\delta$, the time interval between
two consecutive observations, is the inverse sampling rate. In the first we use
$\delta = \Delta t$ as the time interval between two consecutive observations in the
estimators. In the second we subsample the data, using $\delta > \Delta t$, and study how
the estimated coefficients behave as a function of the subsampling. We use the data
generated from our simulation in the estimators (2.4) and (2.6) to estimate the drift
coefficient and in (2.2) to estimate the diffusion coefficient of (1.11). For the most
part we work in one dimension and fit a single drift and diffusion parameter so that (1.7)
becomes (1.11). When we work in more than one dimension, or estimate more than just a
single drift or diffusion parameter, we use natural generalizations of the estimators
defined in the previous section.

Let us summarize the main conclusions that can be drawn from the numerical
experiments; recall that $\Delta t \ll \epsilon^2$. First, if we choose
$\delta = \Delta t$, that is, if we don't subsample, then the resulting estimators do not
generate the correct estimates of the homogenized coefficients. If, on the other hand, we
subsample with $\epsilon^2 \ll \delta \ll O(1)$, then the estimators generate the values
of the parameters of the homogenized equation. Furthermore, there is an optimal sampling
rate: there exists a $\delta^*$ which minimizes the distance between the homogenized value
of the parameter and the value generated by the estimator. The optimal sampling rate
depends sensitively on $\sigma$. It is also of interest that, in higher dimensions, the
optimal sampling rate can be different for different parameters.

Figure 2: Estimation of the drift and diffusion coefficients vs $\epsilon$ for the
potential (3.1). Solid line: estimated coefficient. Dashed line: homogenized coefficient.
Dotted line: unhomogenized coefficient.

The above observations appear to hold independently of the detailed form of the 
large-scale part of the potential V (provided, of course, that it satisfies appropriate 
convexity conditions). In addition, the performance of the estimators seems to be the 
same irrespective of the dimension of the problem. 

Another interesting observation is that the second estimator for the drift coefficient
(2.6) performs at least as well as the maximum likelihood estimator (2.4), and in some
instances outperforms it.



3.1 Failure Without Subsampling 

In this section we study the estimators $\hat{A}$ and $\hat{\Sigma}$ when the data is given
from the solution of equation (1.6) with $\epsilon \ll 1$ and $\Delta t = \delta$, so that
no subsampling is used. We use the potential




(3.1) 
Figure 3: Estimation of the drift and diffusion coefficients vs $\sigma$ for the potential
(3.1) with $\epsilon = 0.1$. Solid line: estimated coefficient. Dashed line: homogenized
coefficient. Dotted line: unhomogenized coefficient.



The small-scale part of the potential is

$$p(y) = \cos(y). \tag{3.2}$$

In Figure 2 we plot the estimators $\hat{A}$ and $\hat{\Sigma}$ for various values of
$\epsilon$. For comparison we also plot the homogenized coefficients $A$ and $\Sigma$ and
the unhomogenized coefficients $\alpha$ and $\sigma$. We observe that the estimators always
give us the coefficients $\alpha$ and $\sigma$ of the original SDE (1.6). In particular,
the performance of the estimators does not improve as $\epsilon \to 0$. In Figure 3 we
plot the estimators for various values of the diffusion coefficient $\sigma$. We notice
that the estimators give the values of the coefficients $\alpha$ and $\sigma$, for all
values of $\sigma$. Since the homogenized coefficients decay to zero exponentially fast in
$\sigma$, the results of Figure 3 indicate that the estimators give exponentially wrong
results when $\sigma \ll 1$.
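For the cosine potential (3.2) the size of this discrepancy can be made explicit. The short computation below is a worked illustration that uses only (1.12)-(1.13) with $L = 2\pi$, together with the standard integral representation and large-argument asymptotics of the modified Bessel function $I_0$.

```latex
Z = \hat{Z} = \int_0^{2\pi} e^{\pm\cos(y)/\sigma}\,dy = 2\pi I_0(1/\sigma),
\qquad
K = \frac{4\pi^2}{Z\hat{Z}} = \frac{1}{I_0(1/\sigma)^2},
\\[4pt]
I_0(z) \sim \frac{e^{z}}{\sqrt{2\pi z}} \quad (z \to \infty)
\quad\Longrightarrow\quad
\frac{A}{\alpha} = \frac{\Sigma}{\sigma} = K \sim \frac{2\pi}{\sigma}\, e^{-2/\sigma}
\quad (\sigma \to 0).
```

The exponent $2/\sigma$ is the height of the barrier of $\cos(y)$, namely 2, measured in units of the temperature $\sigma$.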

These results indicate the need to subsample - i.e. to choose $\delta$ appropriately as a
function of $\epsilon$.



3.2 Success With Subsampling 

Now, rather than using all the data that were generated from the solution of equation
(1.4), we use only a fraction of them. We choose $\delta$ in the estimators (2.2), (2.4)
and (2.6) as follows:

$$\Delta t_{sam} = \delta = 2^k \Delta t, \qquad k = 0, 1, 2, \ldots,$$

and we study the performance of the estimators as a function of the sampling rate. We
investigate this issue for three different model problems.
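The subsampling experiment can be sketched as follows. This is a minimal illustration under assumptions made here rather than a reproduction of the experiments: it takes the quadratic large-scale potential $V(x) = x^2/2$ and the cosine potential (3.2), starts the path at $x = 0$ instead of drawing from the invariant measure, and uses hypothetical names (`simulate_multiscale`, `diffusion_estimator`) and parameter values. The path is generated once with time-step $\Delta t$ and the quadratic-variation estimator (2.2) is then evaluated on every $2^k$-th sample.

```python
import math
import numpy as np

def simulate_multiscale(alpha, sigma, eps, dt, n_steps, rng):
    """Euler-Maruyama for (1.6) in 1D, assuming V(x) = x^2/2 and p(y) = cos(y)."""
    noise = np.sqrt(2.0 * sigma * dt) * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = 0.0
    for n in range(n_steps):
        # -alpha V'(x) - (1/eps) p'(x/eps), with p'(y) = -sin(y)
        drift = -alpha * x[n] + math.sin(x[n] / eps) / eps
        x[n + 1] = x[n] + drift * dt + noise[n]
    return x

def diffusion_estimator(x, delta):
    """Quadratic-variation estimator (2.2) with d = 1."""
    return np.sum(np.diff(x) ** 2) / (2.0 * (len(x) - 1) * delta)

rng = np.random.default_rng(2)
alpha, sigma, eps, dt = 1.0, 0.5, 0.1, 5e-4      # dt is small compared with eps^2
x = simulate_multiscale(alpha, sigma, eps, dt, n_steps=200_000, rng=rng)

# Delta t_sam = delta = 2^k * dt: keep every 2^k-th sample of the fine path.
estimates = {}
for k in range(11):
    stride = 2**k
    estimates[k] = diffusion_estimator(x[::stride], stride * dt)

for k, value in estimates.items():
    print(f"delta = {2**k * dt:.4f}  Sigma_hat = {value:.3f}")
```

With $k = 0$ the estimate stays close to the unhomogenized value $\sigma$, while for the coarser sampling intervals it drops towards the much smaller homogenized value. The path here is short, so the coarse estimates are noisy.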
Figure 4: Estimation of the drift and diffusion coefficients vs $\Delta t_{sam}$ for the
potential (3.1) with $\epsilon = 0.1$. Solid line: estimated coefficient. Dashed line:
homogenized coefficient. Dotted line: unhomogenized coefficient.



3.2.1 OU Processes in 1D

We study the problem in one dimension with the large-scale part of the potential given
by (3.1) and with the fluctuating part being the cosine potential (3.2). The two
estimators $\hat{A}$ and $\tilde{A}$ for the drift coefficient produce almost identical
results and we only present results for the maximum likelihood estimator $\hat{A}$. In
Figure 4 we present the estimated values of the drift and diffusion coefficients as a
function of the inverse sampling rate $\delta = \Delta t_{sam}$ when $\epsilon = 0.1$,
$\alpha = 1.0$, $\sigma = 0.5$. We observe that, provided that we subsample at an
appropriate rate, we are able to estimate the parameters of the homogenized equation
correctly. Notice also that the estimators for the drift and the diffusion coefficient
show very similar dependence on the sampling rate. This is in accordance with our
theoretical results; see Theorem 4.5.

In Figure 5 we plot $\hat{\Sigma}$ as a function of the sampling rate for two different
values of $\sigma$. We observe that the estimator of the diffusion coefficient is a
decreasing function of the sampling rate, as expected. In addition to this, there is a
well defined optimal sampling rate, which depends sensitively on $\sigma$. In particular
the optimal $\delta$ is a decreasing function of $\sigma$. This is to be expected, since
when $\sigma \gg 1$ the process $x^\epsilon(t)$ loses its multiscale character and becomes
effectively a standard Brownian motion. Consequently, when $\sigma$ is sufficiently large,
the optimal $\delta$ becomes $\Delta t$, the integration time step. Notice furthermore
that the slope of the $\hat{\Sigma}$ - $\delta$ curve depends on $\sigma$.

In Figure 6 we plot the estimators of the drift and diffusion coefficients versus
$\sigma$, for three different sampling rates. For comparison we also plot the homogenized
coefficients. We observe that all three sampling rates lead to reasonably accurate
estimates for $A$ and $\Sigma$, when $\sigma$ is not too small. On the other hand, the
estimators become less accurate as $\sigma \to 0$. This is also to be expected: when
$\sigma \ll 1$, the accurate simulation of (1.4) requires a very small time step;
moreover, the equation has to be solved over a very long time interval in order for the
invariant measure of the process to be well represented. Hence, our hypothesis that the
errors due to discretization and finite time of integration are small is not valid. In
addition, as $\sigma$ tends to 0, the optimal sampling rate increases, and becomes much
larger than the coarsest sampling rate that we use in the simulations.

Figure 5: Estimation of the diffusion coefficient vs $\Delta t_{sam}$ for the potential
(3.1) with $\epsilon = 0.1$, for two different values of $\sigma$. Solid line: estimated
coefficient. Dashed line: homogenized coefficient. Dotted line: unhomogenized coefficient.

In Figure 7 we plot the estimators versus $\epsilon$, for three different values of the
sampling rate. As expected, the deviation of the estimated values of the drift and
diffusion coefficients from the homogenized values is an increasing function of
$\epsilon$. On the other hand, the optimal sampling rate does not appear to depend
sensitively on $\epsilon$: it is always the same sampling rate that minimizes the distance
between the estimated coefficient and the homogenized one, for all values of $\epsilon$.

3.2.2 A Bistable Potential 

We consider the multiscale SDE in one dimension with a mean potential of the bistable form

$$V(x; \alpha, \beta) = -\frac{1}{2}\alpha x^2 + \frac{1}{4}\beta x^4. \tag{3.3}$$

The fluctuating part of the potential is given by (3.2). The homogenized equation is

$$dX(t) = \left(AX(t) - BX(t)^3\right)dt + \sqrt{2\Sigma}\,d\beta(t), \tag{3.4}$$

where the homogenized coefficients are given by

$$A = \alpha K, \quad B = \beta K, \quad \Sigma = \sigma K, \quad K = \frac{4\pi^2}{Z\hat{Z}},$$

where $Z$ and $\hat{Z}$ are given by (1.13) with $L = 2\pi$ and $p(y) = \cos(y)$. We will
estimate the diffusion coefficient using formula (2.2) with $d = 1$. For the two
parameters of the drift we use generalizations of the maximum likelihood estimator
$\hat{A}$.

Figure 6: Estimation of the drift and diffusion coefficient vs $\sigma$ for the potential
(3.1) with $\epsilon = 0.1$, $\alpha = 1.0$, for three different sampling rates. Solid
line: $\Delta t_{sam} = 0.128$. Dash-dotted line: $\Delta t_{sam} = 0.256$. Dotted line:
$\Delta t_{sam} = 0.512$. Dashed line: homogenized coefficient.

In Figures 8 and 9 we present the estimators for the two drift coefficients versus the
sampling rate, for two different values of $\sigma$. We observe that the performance of
the estimators is qualitatively similar to the OU case. Notice also that the optimal
sampling rate is approximately the same for both coefficients.

In Figure 10 we plot the estimator for the diffusion coefficient versus the sampling
rate, for two different values of $\sigma$. The conclusions reached from the numerical
study of $\hat{\Sigma}$ for the one dimensional OU process carry over almost verbatim to
this case.

3.2.3 A Quadratic Potential in 2D

We now consider the multiscale SDE in two dimensions with a separable fast potential
$p(y)$:

$$dx^\epsilon(t) = -\nabla V(x^\epsilon(t), B)\,dt - \left(\frac{1}{\epsilon}\nabla p_1\!\left(\frac{x_1^\epsilon(t)}{\epsilon}\right) + \frac{1}{\epsilon}\nabla p_2\!\left(\frac{x_2^\epsilon(t)}{\epsilon}\right)\right)dt + \sqrt{2\sigma}\,d\beta(t), \tag{3.5}$$

where $B$ is the set of the drift parameters that we wish to estimate. The homogenized
equation reads

$$dX(t) = -K\nabla V(X(t), B)\,dt + \sqrt{2\sigma K}\,d\beta(t), \tag{3.6}$$

where

$$K = \begin{pmatrix} \dfrac{L^2}{Z_1\hat{Z}_1} & 0 \\ 0 & \dfrac{L^2}{Z_2\hat{Z}_2} \end{pmatrix} \tag{3.7}$$

and

$$Z_i = \int_0^L e^{-p_i(y_i)/\sigma}\,dy_i, \qquad \hat{Z}_i = \int_0^L e^{p_i(y_i)/\sigma}\,dy_i, \qquad i = 1, 2.$$

In the above $L$ denotes the period of $p(y)$.

We will consider the case of a general quadratic potential in two dimensions:

$$V(x, B) = \frac{1}{2}x^T B x, \tag{3.8}$$

with $B$ symmetric positive-definite. For the fluctuations we will use a simple
two-dimensional extension of the cosine potential (3.2):

$$p_1(y_1) = \cos(y_1), \qquad p_2(y_2) = \frac{1}{2}\cos(y_2).$$

Figure 7: Estimation of the drift and diffusion coefficient vs $\epsilon$ for the potential
(3.1) with $\alpha = 1.0$, $\sigma = 0.5$, for three different sampling rates. Solid line:
$\Delta t_{sam} = 0.128$. Dash-dotted line: $\Delta t_{sam} = 0.256$. Dotted line:
$\Delta t_{sam} = 0.512$. Dashed line: homogenized coefficient.
Figure 8: Estimation of the parameters of the bistable potential (3.3) as a function of
the sampling rate for $\sigma = 0.5$, $\epsilon = 0.1$. Solid line: estimated coefficient.
Dashed line: homogenized coefficient. Dotted line: unhomogenized coefficient.



Our goal is to estimate the diffusion tensor and the drift coefficients. We will estimate
the diffusion tensor through the quadratic variation:

$$\hat{\Sigma}_{N,\delta}(x) = \frac{1}{2N\delta}\sum_{n=0}^{N-1} (x_{n+1} - x_n) \otimes (x_{n+1} - x_n), \tag{3.9}$$

where $\otimes$ stands for the tensor product. For simplicity we will assume that the
diffusion tensor in our model is diagonal. This is consistent with the homogenized
diffusion tensor, see eq. (3.7). We will use generalizations of the maximum likelihood
estimator $\hat{A}$ in order to estimate the parameters of the quadratic potential.

In Figure 11 we present the estimated values of the two non-zero components of
the diffusion tensor versus the sampling rate.¹ The performance of the estimator for
the diffusion tensor is, qualitatively at least, similar to its performance in the one
dimensional problems considered in the previous two subsections. Notice, however, that
the optimal sampling rate is quite different for the two non-zero components of the
diffusion tensor.

In Figure 12 we present the estimated values of the four drift coefficients. The
results are in accordance with the one dimensional theory developed in this paper, as
well as with the numerical experiments shown in one dimension. We remark that the
estimators capture successfully the fact that the homogenized matrix $B$ is not
symmetric. Notice furthermore that, as for the diffusion matrix, the optimal sampling rate
is different for different components of the matrix $B$.

¹ The estimated value of the off-diagonal elements is almost 0 for all values of the
sampling rate, in accordance with the theoretical result (3.7).
Figure 9: Estimation of the parameters of the bistable potential (3.3) as a function of
the sampling rate for $\sigma = 0.7$, $\epsilon = 0.1$. Solid line: estimated coefficient.
Dashed line: homogenized coefficient. Dotted line: unhomogenized coefficient.



Thus, in this simple two dimensional multiscale model, the optimal sampling rate 
is different in different directions. This suggests that extreme care has to be taken when 
estimating parameters for multidimensional, multiscale stochastic processes. 



3.3 The Second Estimator for the Drift Coefficient 

In this section we compare the performance of the two estimators for the drift
coefficient, namely $\hat{A}$ and $\tilde{A}$ given by equations (2.4) and (2.6)
respectively. We estimate the drift parameter of (1.4) in one dimension for a quartic and
a sixth-degree large-scale potential $V(x)$:




(3.10) 



and 

V{x)^^ax^. (3.11) 

In both cases the small scale fluctuations are represented by the cosine potential (3.2).
In Figure 13 we present the estimated values of the drift coefficient as a function of
the sampling rate for two different $\sigma$ for the quartic potential (3.10). We also
plot the effective and the unhomogenized values of the drift coefficient. Similar results
for the sixth-degree potential (3.11) are presented in Figure 14. In both cases we observe
that the alternative estimator $\tilde{A}$ performs better than $\hat{A}$ in this situation
where the data is subsampled.
Figure 10: Estimation of the diffusion coefficient for the bistable potential (3.3) as a
function of the sampling rate for $\alpha = 1.0$, $\beta = 2.0$, $\epsilon = 0.1$. Panels:
(a) $\sigma = 0.5$; (b) $\sigma = 0.7$. Solid line: estimated coefficient. Dashed line:
homogenized coefficient. Dotted line: unhomogenized coefficient.



4 Statement of Main Results 

In this section we present theorems which substantiate the numerical observations in
the preceding section. The first result shows that, without subsampling, the parameter
estimators for the homogenized model will be asymptotically biased: they recover the
parameters from the unhomogenized equations.

Theorem 4.1. Let $x^\epsilon(t)$ be the solution of (1.6) with $x^\epsilon(0)$ distributed
according to the invariant measure of the process. Then the estimator (2.3) satisfies

$$\lim_{\epsilon\to 0}\,\lim_{T\to\infty} \hat{A}(x^\epsilon) = \alpha \quad \text{a.s.} \tag{4.1}$$

Fix $T = N\delta$ in (2.2). Then for every $\epsilon > 0$ we have

$$\lim_{N\to\infty} \hat{\Sigma}_{N,\delta}(x^\epsilon) = \sigma \quad \text{a.s.} \tag{4.2}$$

Now consider the one dimensional problem

$$dx^\epsilon(t) = -\alpha V'(x^\epsilon(t))\,dt - \frac{1}{\epsilon}\, p'\!\left(\frac{x^\epsilon(t)}{\epsilon}\right)dt + \sqrt{2\sigma}\,d\beta(t). \tag{4.3}$$

The next two results show that, with appropriate subsampling, the estimators
recover the correct drift and diffusion coefficients for the homogenized model (1.11)
when taking data from the unhomogenized equation (4.3).
Figure 11: Estimation of the non-zero elements of the diffusion tensor for the 2d
quadratic potential (3.8) as a function of the sampling rate for
$B_{11} = B_{12} = B_{21} = 2$, $B_{22} = 3$, $\sigma = 0.5$, $\epsilon = 0.1$. Solid line:
estimated coefficient. Dashed line: homogenized coefficient. Dotted line: unhomogenized
coefficient.



Theorem 4.2. Let $x^\epsilon(t)$ be the solution of (4.3) with $x^\epsilon(0)$ distributed
according to the invariant measure of the process. Further, let $\delta = \epsilon^\alpha$,
$\alpha \in (0, 1)$ and $N = [\epsilon^{-\gamma}]$, $\gamma > \alpha$, where $[\cdot]$
denotes the integer part of a number. Then

$$\lim_{\epsilon\to 0} \hat{A}_{N,\delta}(x^\epsilon) = A \quad \text{in law}, \tag{4.4}$$

where $A$ is given by (1.12).

Theorem 4.3. Let $x^\epsilon(t)$ be the solution of (4.3) with $x^\epsilon(0)$ distributed
according to the invariant measure of the process. Fix $T = N\delta$ with
$\delta = \epsilon^\alpha$ and $\alpha \in (0, 1)$. Then

$$\lim_{\epsilon\to 0} \hat{\Sigma}_{N,\delta}(x^\epsilon) = \Sigma \quad \text{in law}, \tag{4.5}$$

where $\Sigma$ is given by (1.12).

Remark 4.4. The two previous results require $\epsilon/\delta \to 0$ as $\epsilon \to 0$.
In view of the fact that the fast time-scale is $O(\epsilon^2)$ (see equation (5.3)) we
might expect that this could be relaxed to $\epsilon^2/\delta \to 0$ as $\epsilon \to 0$.
However we have not been able to prove this. See Remark 5.8 for further discussion of this
point.

The final result concerns the second drift estimator and again concerns input of
data from the unhomogenized equation (4.3) into the parameter estimator for the
homogenized equation (1.11). It requires an estimate of the diffusion coefficient, $\hat{\Sigma}$. If
$\hat{\Sigma} = \sigma$, then we estimate the drift coefficient incorrectly with $\tilde{A}(x^\epsilon)$; on the other hand,
if $\hat{\Sigma} = \Sigma$, then the estimator $\tilde{A}(x^\epsilon)$ gives the drift of the homogenized equation. (To see
the last result recall that $A/\Sigma = \alpha/\sigma$; see (1.12).) Consequently, for multiscale
gradient systems, it is sufficient to subsample only in a fashion which leads to the correct
diffusion coefficient. This offers a clear computational advantage.

Figure 12: Estimation of the parameters of the 2d quadratic potential (3.8) as a function
of the sampling rate for $\sigma = 0.5$, $\epsilon = 0.1$. Panels: (a) $B_{21}$, (b) $B_{22}$. Solid line: estimated
coefficient. Dashed line: homogenized coefficient. Dotted line: unhomogenized coefficient.

Figure 13: Estimation of the drift coefficients for the quartic potential (3.10) as a
function of the sampling rate for $\epsilon = 0.1$. Solid line: $\hat{A}$. Dash-dot line: $\tilde{A}$. Dashed
line: homogenized coefficient. Dotted line: unhomogenized coefficient.
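The identity $A/\Sigma = \alpha/\sigma$ can be checked numerically. The sketch below assumes the one dimensional formulas behind (1.12) have the form $A = \alpha K$, $\Sigma = \sigma K$ with $K = L^2/(Z\hat{Z})$, $Z = \int_0^L e^{-p(y)/\sigma}dy$ and $\hat{Z} = \int_0^L e^{p(y)/\sigma}dy$ for an $L$-periodic $p$; the function name, the grid size and the example potential $p(y) = \cos y$ are illustrative choices.

```python
import math


def homogenized_coefficients(p, alpha, sigma, period, n_grid=2000):
    """Return (K, A, Sigma) with K = L^2 / (Z * Zhat) for a 1D L-periodic p."""
    h = period / n_grid
    ys = [i * h for i in range(n_grid)]
    # Rectangle rule on a periodic grid (very accurate for smooth periodic integrands).
    Z = h * sum(math.exp(-p(y) / sigma) for y in ys)
    Zhat = h * sum(math.exp(p(y) / sigma) for y in ys)
    K = period ** 2 / (Z * Zhat)
    return K, alpha * K, sigma * K
```

For example, `homogenized_coefficients(math.cos, 1.0, 0.5, 2 * math.pi)` returns a value of `K` strictly between 0 and 1, so both the drift and the diffusion coefficient are reduced by the same factor.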

Theorem 4.5. Let $x^\epsilon(t)$ be the solution of (4.3) with $x^\epsilon(0)$ distributed according to
the invariant measure of the process. Assume that the diffusion coefficient has been
estimated to be $\hat{\Sigma}$. Then

\[ \lim_{\epsilon \to 0} \lim_{T \to \infty} \tilde{A}(x^\epsilon) = \frac{\hat{\Sigma}}{\sigma}\,\alpha \quad \text{in law.} \]

5 Preliminary Results 

In this section we collect various results that will be used in the proof of our main 
theorems. We start by investigating some of the properties of the invariant measures of 
the unhomogenized and of the homogenized equation. We then introduce some tools 
useful in the study of homogenization for SDEs. 

Proposition 5.1. The invariant measure of the homogenized equation (1.7) is the Gibbs
measure

\[ \mu(dx) = \rho(x)\,dx = \frac{1}{Z}\, e^{-\alpha V(x)/\sigma}\,dx, \qquad Z = \int_{\mathbb{R}^d} e^{-\alpha V(x)/\sigma}\,dx. \tag{5.1} \]

The Markov process $x(t)$ given by (1.7) is geometrically ergodic: there are $C, \lambda > 0$
such that, for every measurable $f(x)$ satisfying

\[ |f(x)| \le 1 + |x|^p \]

for some integer $p > 0$, we have, for $\mu$-a.e. $x(0)$,

\[ \left| \mathbb{E} f(x(t)) - \int_{\mathbb{R}^d} f(x)\rho(x)\,dx \right| \le C\left(1 + |x(0)|^p\right) e^{-\lambda t}, \]

where $\mathbb{E}$ denotes expectation with respect to Wiener measure.

Figure 14: Estimation of the drift coefficients for the sixth-degree potential (3.11) as a
function of the sampling rate for $\epsilon = 0.1$. Solid line: $\hat{A}$. Dash-dotted line: $\tilde{A}$. Dashed
line: homogenized coefficient. Dotted line: unhomogenized coefficient.

Proof. Assumptions 1.3, together with the formulae for the effective drift and the
effective diffusion coefficient, equation (1.8), imply that the solution $x(t)$ of the
homogenized equation (1.7) has a unique invariant measure with smooth density. The Gibbs
measure (5.1) satisfies

\[ \alpha \nabla V \rho + \sigma \nabla \rho = 0 \]

and hence

\[ K\left(\alpha \nabla V \rho + \sigma \nabla \rho\right) = 0. \]

Because $K$ is constant we deduce that

\[ \alpha K \nabla V \rho + \nabla \cdot (\sigma K \rho) = 0. \]

Thus

\[ \nabla \cdot \left(\alpha K \nabla V \rho + \nabla \cdot (\sigma K \rho)\right) = 0. \]

This is the stationary Fokker-Planck equation for (1.7), showing that the Gibbs measure
$\mu$ is indeed an invariant measure. For the geometric ergodicity we use [19, Thm 5.3].

□



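Proposition 5.2. The invariant measure of the unhomogenized equation (1.6) is the
Gibbs measure

\[ \mu^\epsilon(dx) = \rho^\epsilon(x)\,dx = \frac{1}{Z^\epsilon}\, e^{-\left(\alpha V(x) + p(x/\epsilon)\right)/\sigma}\,dx, \qquad Z^\epsilon := \int_{\mathbb{R}^d} e^{-\left(\alpha V(x) + p(x/\epsilon)\right)/\sigma}\,dx. \tag{5.2} \]

For every $\epsilon > 0$ the Markov process (1.6) is geometrically ergodic: there are $C, \lambda > 0$
such that, for every measurable $f(x)$ satisfying

\[ |f(x)| \le 1 + |x|^p \]

for some integer $p > 0$, we have, for $\mu^\epsilon$-a.e. $x^\epsilon(0)$,

\[ \left| \mathbb{E} f(x^\epsilon(t)) - \int_{\mathbb{R}^d} f(x)\rho^\epsilon(x)\,dx \right| \le C\left(1 + |x^\epsilon(0)|^p\right) e^{-\lambda t}, \]

where $\mathbb{E}$ denotes expectation with respect to Wiener measure.

Furthermore, the measure $\mu^\epsilon$ converges weakly to the invariant measure of the
homogenized dynamics $\mu$ given by (5.1).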



Proof. Assumptions 1.3 imply that $x^\epsilon(t)$ is an ergodic Markov process. Direct
calculation with the Fokker-Planck equation shows that the unique invariant measure of the
process is the Gibbs measure

\[ \rho^\epsilon(x)\,dx = \frac{1}{Z^\epsilon}\, e^{-\left(\alpha V(x) + p(x/\epsilon)\right)/\sigma}\,dx \]

with $Z^\epsilon$ given by (5.2). For the geometric ergodicity we use [19, Thm 5.3].
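Now let

\[ u(x, y) := e^{-\left(\alpha V(x) + p(y)\right)/\sigma}. \]

Since $u(x,y) \in L^1(\mathbb{R}^d; C_{per}(\mathbb{T}^d))$, by [8, Lem. 9.1] we have that

\[ u\left(\cdot, \frac{\cdot}{\epsilon}\right) \rightharpoonup \int_{\mathbb{T}^d} u(\cdot, y)\,dy, \quad \text{weakly in } L^1(\mathbb{R}^d). \]

In particular, since $1 \in L^\infty(\mathbb{R}^d)$,

\[ Z^\epsilon = \int_{\mathbb{R}^d} u\left(x, \frac{x}{\epsilon}\right) dx \to \int_{\mathbb{R}^d} \int_{\mathbb{T}^d} u(x, y)\,dy\,dx. \]

We combine the above two results to conclude that

\[ \rho^\epsilon(x) \rightharpoonup \frac{1}{Z}\, e^{-\alpha V(x)/\sigma}, \quad \text{weakly in } L^1(\mathbb{R}^d), \]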

where $Z$ is given by (5.1). The weak convergence of the densities in $L^1(\mathbb{R}^d)$ implies
the weak convergence of the corresponding probability measures. □

Remark 5.3. The assumption of stationarity of the process $x^\epsilon(t)$ is not necessary for
the proof of the above theorems and is only made for simplicity. Indeed, in the next
section we prove that $x^\epsilon(t)$ is geometrically ergodic and consequently it converges to
its invariant distribution exponentially fast for arbitrary initial conditions.
Furthermore, the fact that the invariant measure of the process $x^\epsilon(t)$ converges weakly, as
$\epsilon \to 0$, to the invariant measure of the homogenized process is important for us as
many of our results will be deduced by taking expectations with respect to the invariant
measure $\mu^\epsilon(dx)$ of the multiscale dynamics (1.6). The weak convergence alluded to
demonstrates that the measure $\mu^\epsilon$ behaves uniformly in $\epsilon \to 0$.

An immediate corollary of the above proposition is that $x^\epsilon(t)$ has bounded
moments of all orders. We will use the notation $\mathbb{E}^{\mu^\epsilon}$ to denote expectation with respect to
the stationary measure of (1.3) on path space, when initial data is distributed according
to the Gibbs measure (5.2).

Corollary 5.4. Let $x^\epsilon(t)$ be the solution of (1.4) with the potential given by (1.5) and
assume that conditions 1.3 are satisfied. Assume furthermore that $x^\epsilon(0)$ is distributed
according to $\mu^\epsilon$. Then, for all $p \ge 1$, there is a constant $C = C(p, T)$, uniform in
$\epsilon \to 0$, such that

\[ \mathbb{E}^{\mu^\epsilon} |x^\epsilon(t)|^p \le C \quad \forall\, t \in [0, T]. \]

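It is convenient for the subsequent analysis to introduce the auxiliary variable

\[ y^\epsilon(t) = \frac{x^\epsilon(t)}{\epsilon}. \]

We can then write equation (1.6) in the form

\begin{align*}
dx^\epsilon(t) &= -\alpha \nabla V(x^\epsilon(t))\,dt - \frac{1}{\epsilon} \nabla p(y^\epsilon(t))\,dt + \sqrt{2\sigma}\,d\beta(t), \tag{5.3a} \\
dy^\epsilon(t) &= -\frac{\alpha}{\epsilon} \nabla V(x^\epsilon(t))\,dt - \frac{1}{\epsilon^2} \nabla p(y^\epsilon(t))\,dt + \sqrt{\frac{2\sigma}{\epsilon^2}}\,d\beta(t). \tag{5.3b}
\end{align*}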
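To make the effect of the fast variable concrete, the following sketch simulates the one dimensional equation with an Euler-Maruyama scheme and compares the quadratic-variation estimate at the simulation step with the estimate from a subsampled path. Everything specific here is an assumption made for illustration: the potentials $V(x) = x^2/2$ and $p(y) = \cos(2\pi y)$, the parameter values, the time step, and the function names. It is not the numerical scheme used for the figures in the paper.

```python
import math
import random


def simulate_multiscale(alpha, sigma, eps, dt, n_steps, rng, x0=0.0):
    """Euler-Maruyama path of dx = -alpha x dt - (1/eps) p'(x/eps) dt + sqrt(2 sigma) dW,
    with the illustrative choices V(x) = x^2/2 and p(y) = cos(2 pi y)."""
    two_pi = 2.0 * math.pi
    noise = math.sqrt(2.0 * sigma * dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # p'(y) = -2 pi sin(2 pi y), so -(1/eps) p'(x/eps) = (2 pi / eps) sin(2 pi x / eps).
        drift = -alpha * x + (two_pi / eps) * math.sin(two_pi * x / eps)
        x += drift * dt + noise * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def diffusion_estimator(x, delta):
    """Sum of squared increments divided by 2 N delta."""
    n = len(x) - 1
    return sum((x[i + 1] - x[i]) ** 2 for i in range(n)) / (2.0 * n * delta)


def fine_and_subsampled(alpha=1.0, sigma=0.5, eps=0.1, dt=1e-4, T=60.0, delta=0.5, seed=0):
    """Return the diffusion estimate at step dt and at the coarser step delta."""
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    path = simulate_multiscale(alpha, sigma, eps, dt, n_steps, rng)
    stride = int(round(delta / dt))
    fine = diffusion_estimator(path, dt)
    coarse = diffusion_estimator(path[::stride], stride * dt)
    return fine, coarse
```

With these illustrative parameters the fine estimate stays near $\sigma$ (slightly inflated by the Euler drift contribution), while the subsampled estimate is much smaller and of the order of the homogenized value $\sigma K$.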



Notice that both processes $x^\epsilon(t)$ and $y^\epsilon(t)$ are driven by the same Brownian motion.
Written in this fashion it is clear that we are in a situation where homogenization
applies. The homogenized equation is found by eliminating $y^\epsilon(t)$ from the scale separated
system for $\{x^\epsilon(t), y^\epsilon(t)\}$. Note that $\mathcal{L}_0$ defined in (1.10) is the generator of the process

\[ dy(t) = -\nabla p(y(t))\,dt + \sqrt{2\sigma}\,d\beta(t), \]

on the unit torus, which governs the dynamics of $y^\epsilon$ to leading order in $\epsilon$. The generator
of the joint process $\{x^\epsilon(t), y^\epsilon(t)\}$ reads

\[ \mathcal{L}^\epsilon = \frac{1}{\epsilon^2} \mathcal{L}_0 + \frac{1}{\epsilon} \mathcal{L}_1 + \mathcal{L}_2, \]

where

\begin{align*}
\mathcal{L}_0 &= -\nabla_y p(y) \cdot \nabla_y + \sigma \Delta_y, \\
\mathcal{L}_1 &= -\nabla_y p(y) \cdot \nabla_x - \alpha \nabla_x V(x) \cdot \nabla_y + 2\sigma \nabla_x \cdot \nabla_y, \\
\mathcal{L}_2 &= -\alpha \nabla_x V(x) \cdot \nabla_x + \sigma \Delta_x.
\end{align*}

The following result can be found in, e.g. Ch. 3].

Lemma 5.5. Assume that $p(y) \in C^\infty_{per}(\mathbb{T}^d; \mathbb{R})$ and that $H(y) \in C^\infty_{per}(\mathbb{T}^d; \mathbb{R}^d)$. Let
$\mu(dy)$ be the Gibbs measure (1.9) and assume that $H(y)$ is centered with respect to
$\mu(dy)$:

\[ \int_{\mathbb{T}^d} H(y)\,\mu(dy) = 0. \tag{5.4} \]

Then the Poisson equation

\[ -\mathcal{L}_0 \chi = H(y), \tag{5.5} \]

has a unique mean-zero solution in $L^2_{per}(\mathbb{T}^d, \mu(dy); \mathbb{R}^d)$. This solution, together with
all its derivatives, is bounded.

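We will need an estimate on integrals whose integrand is centered with respect to
the invariant measure $\mu(dy)$.

Lemma 5.6. Let $H(y) \in C^\infty_{per}(\mathbb{T}^d; \mathbb{R}^d)$ satisfy condition (5.4). Assume that $x^\epsilon(0)$ is
distributed according to (5.2). Then the following estimate holds for any $p \ge 1$ and
$T > 0$:

\[ \mathbb{E}^{\mu^\epsilon} \left| \int_0^T H(y^\epsilon(s))\,ds \right|^p \le C \left( \epsilon^{2p} + \epsilon^p T^p + \epsilon^p T^{p/2} \right). \]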



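Proof. Consider the Poisson equation (5.5) with periodic boundary conditions. Since
$H(y)$ satisfies (5.4), Lemma 5.5 applies and we have that $\chi(y)$ is smooth and bounded,
together with all its derivatives. We now apply the Itô formula to $\chi(y^\epsilon(t))$, where $y^\epsilon(t)$
is the solution of (5.3b), and use (5.5) to obtain

\begin{align*}
\int_0^T H(y^\epsilon(s))\,ds = {} & -\epsilon^2 \left( \chi(y^\epsilon(T)) - \chi(y^\epsilon(0)) \right)
 + \epsilon \sqrt{2\sigma} \int_0^T \langle \nabla_y \chi(y^\epsilon(s)), d\beta(s) \rangle \\
& - \alpha \epsilon \int_0^T \langle \nabla V(x^\epsilon(s)), \nabla \chi(y^\epsilon(s)) \rangle\,ds.
\end{align*}

Now, using the boundedness of $\chi$, we have, for

\[ I(T) := \mathbb{E}^{\mu^\epsilon} \left| \int_0^T H(y^\epsilon(s))\,ds \right|^p, \]

\begin{align*}
I(T) &\le C \left( \epsilon^{2p} + \epsilon^p\, \mathbb{E}^{\mu^\epsilon} \left| \int_0^T |\nabla V(x^\epsilon(s))|\,ds \right|^p
 + \epsilon^p\, \mathbb{E}^{\mu^\epsilon} \left| \int_0^T \langle \nabla_y \chi(y^\epsilon(s)), d\beta(s) \rangle \right|^p \right) \\
&\le C \left( \epsilon^{2p} + \epsilon^p T^{p-1} \int_0^T \mathbb{E}^{\mu^\epsilon} |x^\epsilon(s)|^p\,ds
 + \epsilon^p T^{p/2 - 1} \int_0^T \mathbb{E}^{\mu^\epsilon} |\nabla_y \chi(y^\epsilon(s))|^p\,ds \right) \\
&\le C \left( \epsilon^{2p} + \epsilon^p T^p + \epsilon^p T^{p/2} \right),
\end{align*}

from which the desired estimate follows. In deriving the above we used the estimate
[15, Eqn. 3.25, p. 163] on moments of stochastic integrals. □

For the rest of this section we will restrict ourselves to the one dimensional case. If
we apply the Itô formula to $\phi(y^\epsilon(s))$, where $\phi$ is the solution of the Poisson equation (1.10),
then we obtain, with $x_n^\epsilon = x^\epsilon(n\delta)$,

\begin{align*}
x_{n+1}^\epsilon - x_n^\epsilon = {} & -\alpha \int_{n\delta}^{(n+1)\delta} V'(x^\epsilon(s)) \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) ds \tag{5.6} \\
& + \sqrt{2\sigma} \int_{n\delta}^{(n+1)\delta} \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) d\beta(s) \\
& - \epsilon \left( \phi(y^\epsilon((n+1)\delta)) - \phi(y^\epsilon(n\delta)) \right). \tag{5.7}
\end{align*}

The proof of Theorems 4.2 and 4.3 is based on careful asymptotic analysis of the behavior
of $x_{n+1}^\epsilon - x_n^\epsilon$ given by this formula when both $\epsilon$ and $\delta$ are small. Specifically we will use
the following two propositions. They show how the effective homogenized behaviour
is manifest in the time-$\delta$ Markov chain induced by sampling the path $x^\epsilon(t)$ from (1.6).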
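As a worked illustration of why the one dimensional case is explicit, the cell problem can be integrated in closed form. The sketch below assumes that the Poisson equation (1.10) reads $-\mathcal{L}_0 \phi = -p'$ with $\mathcal{L}_0 = -p'(y)\partial_y + \sigma \partial_y^2$, and that $p$ has period $L$; the symbols $Z$ and $\hat{Z}$ are shorthand for the two normalisation integrals and are assumptions about notation not displayed in this section.

```latex
\begin{align*}
\sigma \phi'' - p'\,(1 + \phi') = 0
  &\;\Longrightarrow\; \bigl( e^{-p/\sigma} (1 + \phi') \bigr)' = 0
  \;\Longrightarrow\; 1 + \phi'(y) = c\, e^{p(y)/\sigma}, \\
\int_0^L \phi'(y)\,dy = 0
  &\;\Longrightarrow\; c = \frac{L}{\hat{Z}}, \qquad
  \hat{Z} = \int_0^L e^{p(y)/\sigma}\,dy, \quad
  Z = \int_0^L e^{-p(y)/\sigma}\,dy, \\
\int_0^L (1 + \phi')^2\, \frac{e^{-p/\sigma}}{Z}\,dy
  &= \frac{c^2 \hat{Z}}{Z} = \frac{L^2}{Z \hat{Z}}
  = \int_0^L (1 + \phi')\, \frac{e^{-p/\sigma}}{Z}\,dy .
\end{align*}
```

Under these assumptions both averages equal $K = L^2/(Z\hat{Z})$. This is consistent with $A/\Sigma = \alpha/\sigma$ and with the two centering conditions on $H(y)$ that are used in the proofs of Propositions 5.7 and 5.9.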

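Proposition 5.7. For $\epsilon, \delta > 0$ sufficiently small and $n \in \mathbb{N}$ there exists an i.i.d.
sequence of random variables $\xi_n \sim \mathcal{N}(0, 1)$ such that

\[ \sqrt{2\sigma} \int_{n\delta}^{(n+1)\delta} \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) d\beta(s) = \sqrt{2\Sigma\delta}\, \xi_n + R_1(\delta, \epsilon) \tag{5.8} \]

in law. The remainder $R_1(\delta, \epsilon)$ satisfies, for every $\beta \in (0, \frac{1}{2})$ and $p > 0$, the estimate

\[ \left( \mathbb{E}^{\mu^\epsilon} |R_1(\epsilon, \delta)|^p \right)^{1/p} \le C \left( \epsilon^{2\beta} + \epsilon^{\beta} \right), \tag{5.9} \]

where $C$ is independent of $\epsilon$ and $\delta$.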

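Remark 5.8. Estimate (5.9) is almost certainly not optimal. Indeed, informal
calculations lead us to expect the estimate

\[ \left( \mathbb{E}^{\mu^\epsilon} |R_1(\epsilon, \delta)|^p \right)^{1/p} \le C \left( \epsilon^{2\beta} + \epsilon^{\beta} \delta^{\beta} + \epsilon^{\beta} \delta^{\beta/2} \right). \]

However, we have not been able to prove this.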

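Proposition 5.9. For $\epsilon, \delta > 0$ sufficiently small and $n \in \mathbb{N}$ we have that

\[ \alpha \int_{n\delta}^{(n+1)\delta} V'(x^\epsilon(s)) \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) ds = A \delta V'(x_n^\epsilon) + R_2(\epsilon, \delta) \tag{5.10} \]

in law. The remainder $R_2(\epsilon, \delta)$ satisfies, for every $p > 0$, the estimate

\[ \left( \mathbb{E}^{\mu^\epsilon} |R_2(\epsilon, \delta)|^p \right)^{1/p} \le C \left( \epsilon^2 + \delta^{3/2} + \epsilon\, \delta^{1/2} \right), \tag{5.11} \]

where $C$ is independent of $\epsilon$ and $\delta$.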



6 Proof of Propositions 5.7 and 5.9

In this section we prove the two Propositions 5.7 and 5.9. These are central to the proof
of the two theorems concerning the behaviour of the estimators with subsampled data.
We start with a rough estimate on $x_{n+1}^\epsilon - x_n^\epsilon$ that we will need for the proofs of the
propositions.

6.1 A Rough Estimate

Lemma 6.1. Let Assumptions 1.3 hold and assume that $x^\epsilon(t)$, the solution of (4.3), is
stationary. Then there exists a constant $C$, independent of $\delta$ and $\epsilon$, such that

\[ \mathbb{E}^{\mu^\epsilon} |x^\epsilon(s) - x_n^\epsilon|^p \le C \left( \delta^p + \delta^{p/2} + \epsilon^p \right), \tag{6.1} \]

for every $s \in [n\delta, (n+1)\delta]$ and every $p \ge 1$.

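Proof. Using the same derivation that leads to (5.7), but with $(n+1)\delta$ replaced by $s$,
we have:

\begin{align*}
x^\epsilon(s) - x_n^\epsilon = {} & -\alpha \int_{n\delta}^{s} V'(x^\epsilon(\tau)) \left( 1 + \partial_y \phi(y^\epsilon(\tau)) \right) d\tau
 + \sqrt{2\sigma} \int_{n\delta}^{s} \left( 1 + \partial_y \phi(y^\epsilon(\tau)) \right) d\beta(\tau) \\
& - \epsilon \left( \phi(y^\epsilon(s)) - \phi(y^\epsilon(n\delta)) \right) \\
=: {} & I^1_{n,\delta} + I^2_{n,\delta} + I^3_{n,\delta}. \tag{6.2}
\end{align*}

We need to estimate the terms in (6.2). We start with $I^3_{n,\delta}$. By Lemma 5.5 we have

\[ \|\phi(y)\|_{L^\infty} \le C. \]

Consequently

\[ \mathbb{E}^{\mu^\epsilon} |I^3_{n,\delta}|^p \le C \epsilon^p. \]

To estimate $I^1_{n,\delta}$ we use again Lemma 5.5 to conclude that

\[ \|1 + \partial_y \phi(y)\|_{L^\infty} \le C. \tag{6.3} \]

The above estimate, together with Assumptions 1.3, Corollary 5.4 and the stationarity
of the process $x^\epsilon(t)$, give

\begin{align*}
\mathbb{E}^{\mu^\epsilon} |I^1_{n,\delta}|^p &\le C \delta^{p-1} \int_{n\delta}^{(n+1)\delta} \mathbb{E}^{\mu^\epsilon} |V'(x^\epsilon(\tau))|^p\,d\tau \\
&\le C \delta^{p-1} \int_{n\delta}^{(n+1)\delta} \mathbb{E}^{\mu^\epsilon} |x^\epsilon(\tau)|^p\,d\tau \\
&\le C \delta^p.
\end{align*}

Estimate [15, Eqn. 3.25, p. 163] on moments of stochastic integrals, together with
equation (6.3), enable us to conclude that

\[ \mathbb{E}^{\mu^\epsilon} |I^2_{n,\delta}|^p \le C \delta^{p/2 - 1} \int_{n\delta}^{(n+1)\delta} \mathbb{E}^{\mu^\epsilon} |1 + \partial_y \phi(y^\epsilon(\tau))|^p\,d\tau \le C \delta^{p/2}. \]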

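We combine the above estimates to obtain (6.1). □

6.2 Proof of Proposition 5.7

From [13, Sec. 1.3], [15, Thm. 3.4.6] we know that the martingale

\[ M(t) := \sqrt{2\sigma} \int_0^t \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) d\beta(s) \]

is equal in law to a time-changed Brownian motion,

\[ M(t) = \beta\left( 2\sigma \int_0^t \left( 1 + \partial_y \phi(y^\epsilon(s)) \right)^2 ds \right). \]

Also the quadratic variation satisfies

\[ \langle M \rangle_t = 2\sigma \int_0^t \left( 1 + \partial_y \phi(y^\epsilon(s)) \right)^2 ds \approx 2\Sigma t. \]

Indeed

\[ \mathbb{E}^{\mu^\epsilon} \langle M \rangle_t = 2\sigma\, \mathbb{E}^{\mu^\epsilon} \int_0^t \left( 1 + \partial_y \phi(y^\epsilon(s)) \right)^2 ds = 2\Sigma t, \]

where the last equality follows from equation (1.8) for $d = 1$. Using these observations
we write

\begin{align*}
J_n &:= \sqrt{2\sigma} \int_{n\delta}^{(n+1)\delta} \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) d\beta(s) \\
&= \sqrt{2\sigma} \int_0^{(n+1)\delta} \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) d\beta(s) - \sqrt{2\sigma} \int_0^{n\delta} \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) d\beta(s) \\
&= \beta(2\Sigma(n+1)\delta) - \beta(2\Sigma n\delta) + r_{n+1} - r_n \\
&= \sqrt{2\Sigma\delta}\, \xi_n + r_{n+1} - r_n,
\end{align*}

where the $\xi_n$ are i.i.d. unit Gaussian random variables and

\[ r_n = \beta(\langle M \rangle_{n\delta}) - \beta(2\Sigma n\delta). \]

To estimate this difference we follow the proof of [14, Thm. 2.1]. We start by
employing the Hölder continuity of Brownian motion, together with the Hölder inequality,
to estimate, for $q > p$:

\begin{align*}
\mathbb{E}^{\mu^\epsilon} |r_n|^p &= \mathbb{E}^{\mu^\epsilon} \left| \beta(\langle M \rangle_{n\delta}) - \beta\left( \mathbb{E}^{\mu^\epsilon} \langle M \rangle_{n\delta} \right) \right|^p \\
&\le \mathbb{E}^{\mu^\epsilon} \left( \mathrm{H\ddot{o}l}_\beta(\beta)^p \left| \langle M \rangle_{n\delta} - \mathbb{E}^{\mu^\epsilon} \langle M \rangle_{n\delta} \right|^{\beta p} \right) \\
&\le C \left( \mathbb{E}^{\mu^\epsilon} \left| \int_0^{n\delta} H(y^\epsilon(z))\,dz \right|^{\beta q} \right)^{p/q},
\end{align*}

with $\beta \in (0, \frac{1}{2})$. We have used the notation

\[ H(y) := 2\sigma \left( 1 + \partial_y \phi(y) \right)^2 - 2\Sigma. \]

We have also used the fact that, for every $\beta \in (0, \frac{1}{2})$ and every bounded time interval,
the $\beta$-Hölder exponent of Brownian motion is uniformly bounded with probability one.
We have that

\[ \int_{\mathbb{T}} H(y)\,\mu(dy) = 0, \]

where $\mu(dy)$ is defined in (1.9). Since $n\delta \le T$, Lemma 5.6 applies and we have that,
for $q$ sufficiently large and for $\epsilon$ sufficiently small,

\[ \mathbb{E}^{\mu^\epsilon} |r_n|^p \le C \left( \epsilon^{2\beta p} + \epsilon^{\beta p} \right). \]

Since $R_1(\delta, \epsilon) = r_{n+1} - r_n$, this completes the proof of the proposition.

□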



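6.3 Proof of Proposition 5.9

We have

\begin{align*}
\mathbb{E}^{\mu^\epsilon} |R_2(\epsilon, \delta)|^p
&= \mathbb{E}^{\mu^\epsilon} \left| \int_{n\delta}^{(n+1)\delta} \alpha V'(x^\epsilon(s)) \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) ds - \delta A V'(x_n^\epsilon) \right|^p \\
&= \mathbb{E}^{\mu^\epsilon} \left| V'(x_n^\epsilon) \int_{n\delta}^{(n+1)\delta} \left( \alpha \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) - A \right) ds
 + \alpha \int_{n\delta}^{(n+1)\delta} \left( V'(x^\epsilon(s)) - V'(x_n^\epsilon) \right) \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) ds \right|^p \\
&\le C\, \mathbb{E}^{\mu^\epsilon} \left| V'(x_n^\epsilon) \int_{n\delta}^{(n+1)\delta} \left( \alpha \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) - A \right) ds \right|^p \\
&\quad + \alpha^p C\, \mathbb{E}^{\mu^\epsilon} \left| \int_{n\delta}^{(n+1)\delta} \left( V'(x^\epsilon(s)) - V'(x_n^\epsilon) \right) \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) ds \right|^p \\
&=: I_1 + I_2,
\end{align*}

where the constant $C$ depends only on $p$. We use the Hölder inequality, Assumptions
1.3, Lemma 6.1 and the uniform bound on $\partial_y \phi(y)$ to obtain, for $\epsilon, \delta$ sufficiently small,

\begin{align*}
I_2 &\le C \delta^{p-1} \int_{n\delta}^{(n+1)\delta} \mathbb{E}^{\mu^\epsilon} |x^\epsilon(s) - x_n^\epsilon|^p\,ds \\
&\le C \delta^{p-1} \int_{n\delta}^{(n+1)\delta} \left( \delta^{p/2} + \epsilon^p \right) ds \\
&\le C \left( \delta^{3p/2} + \delta^p \epsilon^p \right).
\end{align*}

Consequently

\[ \mathbb{E}^{\mu^\epsilon} \left| \int_{n\delta}^{(n+1)\delta} \left( V'(x^\epsilon(s)) - V'(x_n^\epsilon) \right) \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) ds \right|^p \le C \left( \delta^{3p/2} + \delta^p \epsilon^p \right). \tag{6.4} \]

Consider now the function

\[ H(y) := \alpha \left( 1 + \partial_y \phi(y) \right) - A. \]

From the definition of $A$ we get that

\[ \int_{\mathbb{T}} \left( \alpha \left( 1 + \partial_y \phi(y) \right) - A \right) \mu(dy) = 0. \]

Hence, Lemma 5.6 applies and we get

\[ \mathbb{E}^{\mu^\epsilon} \left| \int_{n\delta}^{(n+1)\delta} \left( \alpha \left( 1 + \partial_y \phi(y^\epsilon(s)) \right) - A \right) ds \right|^p \le C \left( \epsilon^{2p} + \epsilon^p \delta^p + \epsilon^p \delta^{p/2} \right). \]

We combine the above estimate with (1.14) and Corollary 5.4 to obtain

\[ I_1 \le C \left( \epsilon^{2p} + \epsilon^p \delta^p + \epsilon^p \delta^{p/2} \right), \tag{6.5} \]

for $\epsilon, \delta$ sufficiently small. The proof of the proposition follows from estimates (6.4)
and (6.5). □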



7 Proof of Main Theorems 

Here we combine the results from the preceding two sections to complete the proofs of 
the main theorems. 



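7.1 Proof of Theorem 4.1

We combine equations (2.3) and (1.6) to calculate

\begin{align*}
\hat{A}(x^\epsilon) &= -\frac{\int_0^T \left\langle \nabla V(x^\epsilon(t)),\, -\alpha \nabla V(x^\epsilon(t))\,dt - \frac{1}{\epsilon} \nabla p\left( \frac{x^\epsilon(t)}{\epsilon} \right) dt + \sqrt{2\sigma}\,d\beta(t) \right\rangle}{\int_0^T |\nabla V(x^\epsilon(t))|^2\,dt} \\
&= \alpha + \frac{\int_0^T \left\langle \nabla V(x^\epsilon(t)), \frac{1}{\epsilon} \nabla p\left( \frac{x^\epsilon(t)}{\epsilon} \right) \right\rangle dt}{\int_0^T |\nabla V(x^\epsilon(t))|^2\,dt}
 - \sqrt{2\sigma}\, \frac{\int_0^T \langle \nabla V(x^\epsilon(t)), d\beta(t) \rangle}{\int_0^T |\nabla V(x^\epsilon(t))|^2\,dt} \\
&=: \alpha + I_1(T, \epsilon) - I_2(T, \epsilon).
\end{align*}

We will treat the terms $I_1(T, \epsilon)$ and $I_2(T, \epsilon)$ separately. We start with $I_2(T, \epsilon)$. Since
the stochastic integral

\[ M_t := \int_0^t \langle \nabla V(x^\epsilon(s)), d\beta(s) \rangle \]

is a continuous martingale which is null at 0, the strong law of large numbers for
martingales [23, p. 187] applies and we have that

\[ \lim_{T \to +\infty} \frac{M_T}{\langle M \rangle_T} = 0 \quad \text{a.s.} \]

Consequently

\[ \lim_{T \to \infty} I_2(T, \epsilon) = 0 \quad \text{a.s.} \tag{7.1} \]

Let us consider now the term $I_1(T, \epsilon)$. We use the ergodic theorem to deduce that

\[ \lim_{T \to \infty} I_1(T, \epsilon) = \lim_{T \to \infty} \frac{\frac{1}{T} \int_0^T \left\langle \nabla V(x^\epsilon(s)), \frac{1}{\epsilon} \nabla p\left( \frac{x^\epsilon(s)}{\epsilon} \right) \right\rangle ds}{\frac{1}{T} \int_0^T |\nabla V(x^\epsilon(s))|^2\,ds}
 = \frac{\mathbb{E}^{\mu^\epsilon} \left\langle \nabla V(x), \frac{1}{\epsilon} \nabla p\left( \frac{x}{\epsilon} \right) \right\rangle}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2} \quad \text{a.s.} \]

Now we use Proposition 5.2 to compute

\begin{align*}
\frac{\mathbb{E}^{\mu^\epsilon} \left\langle \nabla V(x), \frac{1}{\epsilon} \nabla p\left( \frac{x}{\epsilon} \right) \right\rangle}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2}
&= \frac{\int_{\mathbb{R}^d} \left\langle \nabla V(x), \frac{1}{\epsilon} \nabla p\left( \frac{x}{\epsilon} \right) \right\rangle \rho^\epsilon(x)\,dx}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2} \\
&= \frac{-\alpha\, \mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2 + \sigma\, \mathbb{E}^{\mu^\epsilon} (\Delta V(x))}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2} \\
&= -\alpha + \sigma\, \frac{\mathbb{E}^{\mu^\epsilon} (\Delta V(x))}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2}.
\end{align*}

In deriving the penultimate line we used an integration by parts. The weak
convergence of $\mu^\epsilon$ to $\mu$ (second part of Proposition 5.2), formula (5.1), together with another
integration by parts give

\[ \lim_{\epsilon \to 0} \sigma\, \frac{\mathbb{E}^{\mu^\epsilon} (\Delta V(x))}{\mathbb{E}^{\mu^\epsilon} (|\nabla V(x)|^2)}
 = \sigma\, \frac{\mathbb{E}^{\mu} (\Delta V(x))}{\mathbb{E}^{\mu} (|\nabla V(x)|^2)}
 = \sigma\, \frac{\mathbb{E}^{\mu} (\Delta V(x))}{\frac{\sigma}{\alpha}\, \mathbb{E}^{\mu} (\Delta V(x))}
 = \alpha. \]

We combine the above calculations to conclude that

\[ \lim_{\epsilon \to 0} \lim_{T \to \infty} I_1(T, \epsilon) = 0 \quad \text{a.s.} \tag{7.2} \]

The proof of the convergence of the maximum likelihood estimator, eqn. (4.1), now
follows from equations (7.2) and (7.1).

The proof of the convergence of the estimator for the diffusion coefficient, eqn.
(4.2), follows from the definition of the quadratic variation, see e.g. [4]. □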



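Remark 7.1. An immediate corollary of the proof of the above theorem is that

\[ \lim_{T \to \infty} \hat{A}(x^\epsilon) = \sigma\, \frac{\mathbb{E}^{\mu^\epsilon} (\Delta V(x))}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2} \quad \text{a.s.} \]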



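7.2 Proof of Theorem 4.2

We combine Proposition 5.9 and (5.7) to conclude that

\[ x_{n+1}^\epsilon - x_n^\epsilon = J_n - A \delta V'(x_n^\epsilon) + R(\epsilon, \delta), \]

where $J_n$ is as defined in the proof of Proposition 5.7 and, for $\epsilon, \delta$ sufficiently small
and $\alpha \in (0, 1)$,

\[ \left( \mathbb{E}^{\mu^\epsilon} |R(\epsilon, \delta)|^p \right)^{1/p} \le C \left( \delta^{3/2} + \epsilon \right). \tag{7.3} \]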

Notice that 

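We combine this with formula (2.4) to obtain

\begin{align*}
\hat{A}_{N,\delta}(x^\epsilon) &= A - \frac{\sum_{n=0}^{N-1} V'(x_n^\epsilon) J_n}{\delta \sum_{n=0}^{N-1} |V'(x_n^\epsilon)|^2}
 - \frac{\sum_{n=0}^{N-1} V'(x_n^\epsilon) R(\epsilon, \delta)}{\delta \sum_{n=0}^{N-1} |V'(x_n^\epsilon)|^2} \\
&=: A - I_1 - I_2. \tag{7.4}
\end{align*}

We need to control the terms $I_1$ and $I_2$. We start with $I_1$, which we rewrite in the form

\[ I_1 = \frac{1}{\sqrt{N\delta}} \cdot \frac{\frac{1}{\sqrt{N\delta}} \sum_{n=0}^{N-1} V'(x_n^\epsilon) J_n}{\frac{1}{N} \sum_{n=0}^{N-1} |V'(x_n^\epsilon)|^2}. \]

The central limit theorem for (discrete) martingales implies that

\[ \lim_{N \to \infty} \frac{1}{\sqrt{N\delta}} \sum_{n=0}^{N-1} V'(x_n^\epsilon) J_n = \frac{1}{\sqrt{\delta}}\, \mathcal{N}(0, c^2 \delta) = c\, \mathcal{N}(0, 1) \quad \text{in law,} \]

for some $c$ uniform in $\epsilon \to 0$. In the above we have used the fact that $\mathbb{E}^{\mu^\epsilon} |J_0|^2 = 2\Sigma\delta$.
On the other hand, the ergodic theorem implies that

\[ \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} |V'(x_n^\epsilon)|^2 = \mathbb{E}^{\mu^\epsilon} |V'(x)|^2 \quad \text{a.s.} \tag{7.5} \]

Hence, by Slutsky's theorem, and remembering that $N = [\epsilon^{-\gamma}]$ with $\gamma > \alpha$, so that
$N\delta \to \infty$ as $\epsilon \to 0$, we have that

\[ \lim_{\epsilon \to 0} I_1 = 0 \quad \text{in law.} \tag{7.6} \]

Consider now the term $I_2$. It can be written as

\[ I_2 = \frac{\frac{1}{N\delta} \sum_{n=0}^{N-1} V'(x_n^\epsilon) R(\epsilon, \delta)}{\frac{1}{N} \sum_{n=0}^{N-1} |V'(x_n^\epsilon)|^2}. \]

The ergodic theorem implies that the denominator in the above expression converges
a.s. to a finite value. To study the numerator of the above expression we use estimate
(7.3), together with the Hölder inequality, to estimate

\begin{align*}
\mathbb{E}^{\mu^\epsilon} \left| \frac{1}{N\delta} \sum_{n=0}^{N-1} V'(x_n^\epsilon) R(\epsilon, \delta) \right|
&\le \frac{1}{N\delta} \sum_{n=0}^{N-1} \left( \mathbb{E}^{\mu^\epsilon} |V'(x_n^\epsilon)|^q \right)^{1/q} \left( \mathbb{E}^{\mu^\epsilon} |R(\epsilon, \delta)|^p \right)^{1/p} \\
&\le C \left( \delta^{1/2} + \epsilon\, \delta^{-1} \right) \to 0 \quad \text{as } \epsilon \to 0.
\end{align*}

In the above we have used Corollary 5.4 together with Assumptions 1.3. The above
calculation shows that the numerator of $I_2$ converges to 0 in $L^1$, and hence in law. This,
together with the a.s. convergence of the denominator and Slutsky's theorem gives

\[ \lim_{\epsilon \to 0} I_2 = 0 \quad \text{in law.} \tag{7.7} \]

Combining (7.4), (7.6) and (7.7) completes the proof of the theorem. □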

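7.3 Proof of Theorem 4.3

We combine Proposition 5.7 with (5.7) to write the difference $x_{n+1}^\epsilon - x_n^\epsilon$ in the form

\[ x_{n+1}^\epsilon - x_n^\epsilon = \sqrt{2\Sigma\delta}\, \xi_n + R(\delta, \epsilon) \tag{7.8} \]

in law, where, for $\epsilon, \delta$ sufficiently small,

\[ \left( \mathbb{E}^{\mu^\epsilon} |R(\epsilon, \delta)|^p \right)^{1/p} \le C \left( \delta + \epsilon^{\beta} \right). \tag{7.9} \]

We substitute (7.8) into the formula for the estimator (2.2) with $d = 1$ to obtain

\[ \hat{\Sigma}_{N,\delta}(x^\epsilon) = \Sigma\, \frac{1}{N} \sum_{n=0}^{N-1} \xi_n^2 + \frac{1}{2N\delta} \sum_{n=0}^{N-1} R(\delta, \epsilon)^2 + \frac{1}{N\delta} \sum_{n=0}^{N-1} \sqrt{2\Sigma\delta}\, \xi_n R(\delta, \epsilon). \]

By the law of large numbers the first term tends almost surely to $\Sigma$ as $\epsilon \to 0$ (which
implies $N \to \infty$). Thus it suffices to show that the remaining terms tend to zero in law.
We do this by showing that they tend to zero in $L^1$.
Note that

\[ \mathbb{E}^{\mu^\epsilon} \left| \frac{1}{2N\delta} \sum_{n=0}^{N-1} R(\delta, \epsilon)^2 \right| \le C\, \frac{\delta^2 + \epsilon^{2\beta}}{\delta} = C \left( \epsilon^{\alpha} + \epsilon^{2\beta - \alpha} \right) = o(1), \]

for $\alpha \in (0, 1)$, since $\beta$ can be chosen arbitrarily close to $\frac{1}{2}$.
Similarly

\[ \mathbb{E}^{\mu^\epsilon} \left| \frac{1}{N\delta} \sum_{n=0}^{N-1} \sqrt{2\Sigma\delta}\, \xi_n R(\delta, \epsilon) \right| \le C\, \frac{\delta + \epsilon^{\beta}}{\sqrt{\delta}} = C \left( \epsilon^{\alpha/2} + \epsilon^{\beta - \alpha/2} \right) = o(1), \]

for $\alpha \in (0, 1)$, since $\beta$ can be chosen arbitrarily close to $\frac{1}{2}$. This completes the proof.

□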
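The exponent bookkeeping at the end of this proof is easy to check mechanically. The following sketch assumes a remainder bound of the form $C(\delta + \epsilon^{\beta})$ with $\delta = \epsilon^{\alpha}$, so that the squared-remainder term scales like $\epsilon^{\alpha} + \epsilon^{2\beta - \alpha}$ and the cross term like $\epsilon^{\alpha/2} + \epsilon^{\beta - \alpha/2}$; the function names are hypothetical.

```python
def remainder_exponents(alpha, beta):
    """Powers of eps bounding the two remainder terms when delta = eps**alpha."""
    squared_term = (alpha, 2.0 * beta - alpha)      # from (delta^2 + eps^(2 beta)) / delta
    cross_term = (alpha / 2.0, beta - alpha / 2.0)  # from (delta + eps^beta) / sqrt(delta)
    return squared_term, cross_term


def remainders_vanish(alpha, beta):
    """True when every exponent is positive, so that all terms are o(1) as eps -> 0."""
    squared_term, cross_term = remainder_exponents(alpha, beta)
    return min(squared_term + cross_term) > 0.0
```

For any $\alpha \in (0,1)$ one can pick $\beta < 1/2$ with $\beta > \alpha/2$, which makes `remainders_vanish(alpha, beta)` true; at $\alpha = 1$ no admissible $\beta$ works, which matches the restriction in the theorem.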



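7.4 Proof of Theorem 4.5

Taking the limit $T \to \infty$ in (2.5) gives

\[ \lim_{T \to \infty} \tilde{A}(x^\epsilon) = \hat{\Sigma}\, \frac{\mathbb{E}^{\mu^\epsilon} (\Delta V(x))}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2}. \]

Proposition 5.2 implies that

\[ \lim_{\epsilon \to 0} \frac{\mathbb{E}^{\mu^\epsilon} (\Delta V(x))}{\mathbb{E}^{\mu^\epsilon} |\nabla V(x)|^2} = \frac{\mathbb{E}^{\mu} (\Delta V(x))}{\mathbb{E}^{\mu} |\nabla V(x)|^2}, \]

where $\mathbb{E}^{\mu}$ denotes expectation with respect to the invariant distribution $\mu(dx) = \rho(x)\,dx$ of the
homogenized process, given by formula (5.1). An integration by parts now gives that

\[ \mathbb{E}^{\mu} |\nabla V(x)|^2 = \frac{\sigma}{\alpha}\, \mathbb{E}^{\mu} (\Delta V(x)). \]

Thus, the final result of our considerations is that

\[ \lim_{\epsilon \to 0} \lim_{T \to \infty} \tilde{A}(x^\epsilon) = \frac{\hat{\Sigma}}{\sigma}\, \alpha. \]

□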
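For completeness, the integration by parts used in this proof can be written out. This is a short sketch, assuming that $V$ grows fast enough at infinity for the boundary terms to vanish, with $\rho(x) = Z^{-1} e^{-\alpha V(x)/\sigma}$ as in (5.1).

```latex
\begin{align*}
\mathbb{E}^{\mu} |\nabla V(x)|^2
  &= \frac{1}{Z} \int_{\mathbb{R}^d} |\nabla V(x)|^2\, e^{-\alpha V(x)/\sigma}\,dx
   = -\frac{\sigma}{\alpha}\, \frac{1}{Z} \int_{\mathbb{R}^d}
     \nabla V(x) \cdot \nabla \bigl( e^{-\alpha V(x)/\sigma} \bigr)\,dx \\
  &= \frac{\sigma}{\alpha}\, \frac{1}{Z} \int_{\mathbb{R}^d}
     \Delta V(x)\, e^{-\alpha V(x)/\sigma}\,dx
   = \frac{\sigma}{\alpha}\, \mathbb{E}^{\mu} \bigl( \Delta V(x) \bigr).
\end{align*}
```

As a sanity check, for $V(x) = x^2/2$ in one dimension the left-hand side is the stationary variance $\sigma/\alpha$ and $\Delta V = 1$, so both sides agree.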



8 Conclusions and Future Work 

The problem of parameter estimation for continuous time multiscale diffusion
processes is studied in this paper. Our goal is to accurately fit a homogenized equation
from data which has a multiscale character. Our main conclusions are as follows:

• In order to estimate the drift and diffusion coefficients accurately it is necessary
to subsample.

• There is an optimal subsampling rate, between the two characteristic time-scales
of the multiscale data.

• The optimal subsampling rate may differ for different parameters.

• For gradient multiscale systems it is only necessary to estimate the diffusion
coefficient correctly, if one uses the second estimator for the drift, $\tilde{A}$, defined in
equations (2.5) and (2.6).

Both analysis and numerics are given to substantiate these claims. Many open
questions remain; we list those which seem important to us.

• Rough heuristics indicate that any subsampling rate which is between the two
characteristic time scales of the processes, namely $O(\epsilon^2)$ and $O(1)$, should enable
accurate estimation of the drift and diffusion coefficients. However, our analysis
works only in the case where the subsampling is between $O(\epsilon)$ and $O(1)$.
Closing the gap between intuition and what can be proved would be valuable.

• Analyze other parameter estimation problems for multiscale diffusions, not nec- 
essarily of gradient form. In particular study both averaging and homogenization 
set-ups, as outlined in the introductory section. 

• In this paper we have generated simulated multiscale data by using a multiscale
diffusion process. However this was done to provide a convenient analytical
framework. In applications it is of interest to develop tools for characterizing
the multiscale structure of a given path - to estimate characteristic time-scales.
Related work has been done in [11]. Further study would be of interest.

• Determine precisely the range of subsamplings which will give accurate param- 
eter estimates and optimize the subsampling rate for accuracy. 

• Optimize the algorithm by combining estimates based on shifts of the subsam- 
pled data - so that information is not thrown away; this is done in the context of 
econometrics and finance in |[2|2|. 

• Analyze questions analogous to those raised here for multidimensional multi- 
scale processes. 

• Analyze questions analogous to those raised here for hypoelliptic multiscale dif- 
fusions; in particular the case where the homogenized equation is a fully elliptic 
first order Langevin equation which is derived from an overdamped second-order 
Langevin equation. 

• Study whether there is any advantage in using random subsampling rates. 

• Study drift that depends non-linearly on the parameters to be estimated:

\[ dx^\epsilon(t) = -\nabla V(x^\epsilon(t), \epsilon; \alpha)\,dt + \sqrt{2\sigma}\,d\beta(t). \]

• Parameter estimation for deterministic multiscale problems where the fast pro- 
cess is a strongly mixing chaotic deterministic process. 
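One of the open questions above, combining estimates based on shifts of the subsampled data, can be sketched as follows. This is only an illustration of the idea, assuming the quadratic-variation form of the diffusion estimator and a plain average over the shifted sub-grids; it is not the procedure of the cited econometrics work, and the function names are hypothetical.

```python
import math
import random


def diffusion_estimator(x, delta):
    """Sum of squared increments divided by 2 N delta."""
    n = len(x) - 1
    return sum((x[i + 1] - x[i]) ** 2 for i in range(n)) / (2.0 * n * delta)


def shifted_subsample_estimate(path, dt, stride):
    """Average the quadratic-variation estimator over all `stride` shifted sub-grids.

    Each sub-grid path[shift::stride] has spacing stride * dt, so every observation
    of the fine path is used exactly once.
    """
    delta = stride * dt
    estimates = [diffusion_estimator(path[shift::stride], delta) for shift in range(stride)]
    return sum(estimates) / len(estimates)
```

On a single-scale Brownian path the averaged estimate recovers the diffusion coefficient; for multiscale data the same averaging would be applied at a stride chosen between the two characteristic time-scales.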
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